A Comparative Study of Bandwidth Choice in Kernel Density Estimation for Naive Bayesian Classification
نویسندگان
چکیده
Kernel density estimation (KDE) is an important method in nonparametric learning. While KDE has been studied extensively in the context of accuracy of density estimation, it has not been studied extensively in the context of classification. This paper studies nine bandwidth selection schemes for kernel density estimation in Naive Bayesian classification context, using 52 machine learning benchmark datasets. The contributions of this paper are threefold. First, it shows that some commonly used and very sophisticated bandwidth selection schemes do not give good performance in Naive Bayes. Surprisingly, some very simple bandwidth selection schemes give statistically significantly better performance. Second, it shows that kernel density estimation can achieve statistically significantly better classification performance than a commonly used discretization method in Naive Bayes, but only when appropriate bandwidth selection schemes are applied. Third, this study gives bandwidth distribution patterns for the investigated bandwidth selection
منابع مشابه
Bayesian Approaches to Non-parametric Estimation of Densities on the Unit Interval
This paper investigates nonparametric estimation of density on [0,1]. The kernel estimator of density on [0,1] has been found to be sensitive to both bandwidth and kernel. This paper proposes a unified Bayesian framework for choosing both the bandwidth and kernel function. In a simulation study, the Bayesian bandwidth estimator performed better than others, and kernel estimators were sensitive ...
متن کاملAsymptotic properties of parallel Bayesian kernel density estimators
In this article we perform an asymptotic analysis of Bayesian parallel kernel density estimators introduced by Neiswanger, Wang and Xing [19]. We derive the asymptotic expansion of the mean integrated squared error for the full data posterior estimator and investigate the properties of asymptotically optimal bandwidth parameters. Our analysis demonstrates that partitioning data into subsets req...
متن کاملBayesian Bandwidth Selection in Nonparametric Time - Varying Coefficient Models
Bandwidth plays an important role in determining the performance of local linear estimators. In this paper, we propose a Bayesian approach to bandwidth selection for local linear estimation of time–varying coefficient time series models, where the errors are assumed to follow the Gaussian kernel error density. A Markov chain Monte Carlo algorithm is presented to simultaneously estimate the band...
متن کاملA Validation Test Naive Bayesian Classification Algorithm and Probit Regression as Prediction Models for Managerial Overconfidence in Iran's Capital Market
Corporate directors are influenced by overconfidence, which is one of the personality traits of individuals; it may take irrational decisions that will have a significant impact on the company's performance in the long run. The purpose of this paper is to validate and compare the Naive Bayesian Classification algorithm and probit regression in the prediction of Management's overconfident at pre...
متن کاملA Bayesian mixture model for classification of certain and uncertain data
There are different types of classification methods for classifying the certain data. All the time the value of the variables is not certain and they may belong to the interval that is called uncertain data. In recent years, by assuming the distribution of the uncertain data is normal, there are several estimation for the mean and variance of this distribution. In this paper, we co...
متن کامل